Abstract

The large discrepancy between the values of the free energy of DNA dinucleotides (or dimers) measured by different teams has raised a still unsettled debate. Here the free energy is fitted by a three-parameter empirical formula derived in the framework of the crystal basis model of the genetic code. Sum rules are derived and compared satisfactorily with the data. On the basis of theoretical and phenomenological arguments, a relation between the correlation functions of the dimer distribution and the free energy is assumed. From consistency conditions, further sum rules are derived. A check of these conditions against different samples of experimental data is performed, allowing us to assess the reliability of the different sets of experimental data.
1 Introduction



The importance of computing the free energy $\Delta G$ and the enthalpy $\Delta H$ of DNA dinucleotides, or dimers, was recognized in the eighties by many authors, and several experimental measurements have been performed. The experimental values, however, span an unacceptably wide range. A few years ago SantaLucia performed an accurate analysis and comparison of the data from seven laboratories (see Table 6, taken from ref. [?], where we have replaced the original values of the column Benight [?] with the more recent ones [3]). He concluded that six of the studies were actually in agreement and provided explanations for the discrepancies. In an attempt to settle the controversy by thermodynamic arguments, Miramontes and Cocho [2] have recently analysed the same set of data by assuming a relation between the correlation function of the dimers and their free energy. They concluded that the most reliable set of values is precisely the one that SantaLucia had excluded. Indeed, in ref. [2] a linear relation between the correlation function of a dimer and the corresponding free energy was postulated, which allowed these authors to determine which set of experimental data agrees best with the postulated relation. A shortcoming of this analysis is that the sum of the free energies of the strong dimers does not satisfy an identity derived from the postulated equation. The purpose of this work is to revisit this controversial question. First, we propose a theoretical formula for the free energy, from which sum rules are derived and compared with the experimental values. Second, we motivate a relation between the correlation function and the free energy, different from the one assumed in [2], which satisfies the trivial identities required by the definition of the correlation functions. We perform several consistency checks and try to assess the reliability of the experimental values by comparing them with the values of the correlation matrix computed in [2].

2 Fit for the free energy 

Let us recall that a mathematical framework was proposed [3] in which the codons appear as composite states of nucleotides. The four nucleotides are assigned to the fundamental irreducible representation of the quantum group $U_q(sl_H(2) \oplus sl_V(2))$ in the limit $q \to 0$ (the indices H and V distinguish the two $sl(2)$). A sequence of N nucleotides is then described by a pure state in the N-fold tensor product of the fundamental representation. In particular, dimers or dinucleotides are obtained as the two-fold tensor product; the labels specifying the irreducible representation to which they belong are given in Table 5. In ref. [2] we have fitted older experimental data of the free energy $\Delta G^\circ_{37}$ (for simplicity we omit the temperature label in the following) for RNA dinucleotides with a 4-parameter formula built up with the generators of $U_{q \to 0}(sl_H(2) \oplus sl_V(2))$. In [?] the more recent data of [3] have been fitted with the following 2-parameter formula

$$\Delta G = \alpha + \beta\,(C_H + C_V)\,J_{3H} \qquad (1)$$

where $J_{3\lambda}$ ($\lambda = H$ or $V$) stands for the diagonalized $sl(2)_\lambda$ generator and $C_\lambda$ is the Casimir operator of $U_{q \to 0}(sl(2)_\lambda)$ for the considered dimer $ij$. Let us recall that the eigenvalue of the Casimir operator in the $J$-representation is $J(J+1)$. In order not to overload the notation, here and in the following we do not write the labels of the dimer explicitly, unless they are needed to identify a specific dimer.

Here we propose for the DNA dinucleotides a 3-parameter formula, which is a generalisation of eq. (1):

$$\Delta G = \alpha_0 + \alpha_1 J_{3H} + \alpha_2 (J_{3V})^2 (2J_{3H} + 1) \qquad (2)$$
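This equation leads to the following theoretical values of the dimer free energies $\Delta G$ in terms of the parameters $\alpha_0$, $\alpha_1$, $\alpha_2$:

$$\begin{array}{llll}
AA/TT: & \alpha_0 - \alpha_1 - \alpha_2 & \qquad CT/GA: & \alpha_0 + \alpha_2 \\
AT/TA: & \alpha_0 - \alpha_1 & \qquad GA/CT: & \alpha_0 + \alpha_2 \\
TA/AT: & \alpha_0 - \alpha_1 & \qquad CG/GC: & \alpha_0 + \alpha_1 \\
CA/GT: & \alpha_0 & \qquad GC/CG: & \alpha_0 + \alpha_1 \\
GT/CA: & \alpha_0 & \qquad GG/CC: & \alpha_0 + \alpha_1 + 3\alpha_2
\end{array}$$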
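As a check of the list above, the theoretical values can be generated directly from eq. (2). The Python sketch below is illustrative only. It assumes the crystal-basis labels $(J_{3H}, J_{3V}) = (+1/2, +1/2)$ for C, $(-1/2, +1/2)$ for T, $(+1/2, -1/2)$ for G and $(-1/2, -1/2)$ for A, chosen so that their sums reproduce the dimer labels of Table 5. The names `NUC`, `j3` and `delta_g` are our own.

```python
# Assumed crystal-basis labels (J3H, J3V) of the nucleotides, chosen so that
# their sums reproduce the dimer labels J3H, J3V of Table 5.
NUC = {"C": (0.5, 0.5), "T": (-0.5, 0.5), "G": (0.5, -0.5), "A": (-0.5, -0.5)}


def j3(dimer):
    """Return (J3H, J3V) of a dimer; the third components add up."""
    j3h = sum(NUC[n][0] for n in dimer)
    j3v = sum(NUC[n][1] for n in dimer)
    return j3h, j3v


def delta_g(dimer, a0, a1, a2):
    """Eq. (2): dG = a0 + a1*J3H + a2*J3V**2*(2*J3H + 1)."""
    j3h, j3v = j3(dimer)
    return a0 + a1 * j3h + a2 * j3v ** 2 * (2 * j3h + 1)


if __name__ == "__main__":
    # Coefficients of (a0, a1, a2) for the first dimer of each complementary pair.
    for dimer in ["AA", "AT", "TA", "CA", "GT", "CT", "GA", "CG", "GC", "GG"]:
        coeffs = tuple(delta_g(dimer, *unit) for unit in [(1, 0, 0), (0, 1, 0), (0, 0, 1)])
        print(dimer, coeffs)
```

Each printed triple gives the coefficients of $(\alpha_0, \alpha_1, \alpha_2)$. For instance, CT yields (1.0, 0.0, 1.0), i.e. $\alpha_0 + \alpha_2$, in agreement with the list above.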



A best-fit procedure allows one to evaluate these parameters as follows: 

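$$\alpha_0 = \frac{1}{116}\,(14N_1 + 4N_2 - 6N_3), \qquad \alpha_1 = \frac{1}{116}\,(4N_1 + 26N_2 - 10N_3), \qquad \alpha_2 = \frac{1}{116}\,(-6N_1 - 10N_2 + 15N_3) \qquad (3)$$

where (we label the free energy of a dinucleotide by a pair of indices, one entry per row of Table 6)

$$N_1 = \Delta G^\circ_{GG} + \Delta G^\circ_{CG} + \Delta G^\circ_{GC} + \Delta G^\circ_{GT} + \Delta G^\circ_{GA} + \Delta G^\circ_{CT} + \Delta G^\circ_{CA} + \Delta G^\circ_{TA} + \Delta G^\circ_{AT} + \Delta G^\circ_{AA}$$
$$N_2 = \Delta G^\circ_{GG} + \Delta G^\circ_{CG} + \Delta G^\circ_{GC} - \Delta G^\circ_{AA} - \Delta G^\circ_{AT} - \Delta G^\circ_{TA}$$
$$N_3 = 3\,\Delta G^\circ_{GG} + \Delta G^\circ_{CT} + \Delta G^\circ_{GA} - \Delta G^\circ_{AA} \qquad (4)$$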
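Equations (3) and (4) are the closed-form solution of the ordinary least-squares problem for the ten dimer classes. The design vector $(1, J_{3H}, J_{3V}^2(2J_{3H}+1))$ has the normal matrix [[10, 0, 4], [0, 6, 4], [4, 4, 12]], whose determinant is 464 = 4 x 116; this explains the common denominator. Below is a minimal Python sketch applied to the Gotoh column of Table 6. The dictionary keys, which label each complementary pair by its first dimer, and the function name are our own conventions.

```python
# Gotoh column of Table 6 (kcal/mol); each key is the first dimer of a complementary pair.
GOTOH = {"AA": 0.43, "AT": 0.27, "TA": 0.22, "CA": 0.97, "GT": 0.98,
         "CT": 0.83, "GA": 0.93, "CG": 1.70, "GC": 1.64, "GG": 1.22}


def fit_alphas(dg):
    """Least-squares estimates of (a0, a1, a2) via eqs. (3)-(4)."""
    n1 = sum(dg.values())
    n2 = dg["GG"] + dg["CG"] + dg["GC"] - dg["AA"] - dg["AT"] - dg["TA"]
    n3 = 3 * dg["GG"] + dg["CT"] + dg["GA"] - dg["AA"]
    a0 = (14 * n1 + 4 * n2 - 6 * n3) / 116
    a1 = (4 * n1 + 26 * n2 - 10 * n3) / 116
    a2 = (-6 * n1 - 10 * n2 + 15 * n3) / 116
    return a0, a1, a2


a0, a1, a2 = fit_alphas(GOTOH)
print(f"a0 = {a0:.2f}, a1 = {a1:.2f}, -a2 = {-a2:.2f}")
```

Rounded to two decimals this gives 0.98, 0.70 and 0.14, the Gotoh entries of the best-fit table.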
Hence we get, for the different studies (see Table 6), the following best-fit values of the parameters $\alpha_0$, $\alpha_1$, $\alpha_2$ (in kcal/mol):

      Gotoh   Vologodskii  Breslauer  Delcourt  SantaLucia  Sugimoto  Unified  Benight
α0    0.98    1.37         1.89       1.24      1.53        1.71      1.47     1.35
α1    0.70    0.60         0.99       0.61      0.66        0.81      0.73     0.54
-α2   0.14    0.12         0.18       0.09      0.15        0.16      0.14     0.03
s²    0.0015  0.0011       0.1577     0.0014    0.0114      0.0199    0.0070   0.0069
χ²    0.0243  0.0099       1.0001     0.0167    0.0753      0.0992    0.0821   0.0590



The last two rows give the mean square deviation $s^2 = \frac{1}{N}\sum (y_{exp} - y_{th})^2$ (N is the number of points, here 10) and $\chi^2 = \sum (y_{exp} - y_{th})^2 / y_{th}$. Evaluation of the incomplete Gamma function, which estimates the goodness of fit, shows that the fit is good with a confidence level greater than 95%. Table 7 gives the fitted absolute values of the dimer free energy parameters $\Delta G$ for the different samples. An inspection of the values of $s^2$ and $\chi^2$ shows that eq. (2) fits the different sets of experimental data well, except the one from Breslauer.
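The two goodness-of-fit indicators can be reproduced with a few lines of Python. The sketch below is a minimal illustration for the Gotoh column of Table 6. It refits $\alpha_0$, $\alpha_1$, $\alpha_2$ with eqs. (3)-(4), evaluates eq. (2) for the ten dimer classes, and applies the definitions of $s^2$ and $\chi^2$ given above. The helper names are hypothetical, and the sketch does not attempt the incomplete-Gamma-function evaluation mentioned in the text.

```python
# Gotoh column of Table 6 (kcal/mol); keys are the first dimer of each pair.
GOTOH = {"AA": 0.43, "AT": 0.27, "TA": 0.22, "CA": 0.97, "GT": 0.98,
         "CT": 0.83, "GA": 0.93, "CG": 1.70, "GC": 1.64, "GG": 1.22}


def fit_alphas(dg):
    """Closed-form least-squares estimates of (a0, a1, a2), eqs. (3)-(4)."""
    n1 = sum(dg.values())
    n2 = dg["GG"] + dg["CG"] + dg["GC"] - dg["AA"] - dg["AT"] - dg["TA"]
    n3 = 3 * dg["GG"] + dg["CT"] + dg["GA"] - dg["AA"]
    return ((14 * n1 + 4 * n2 - 6 * n3) / 116,
            (4 * n1 + 26 * n2 - 10 * n3) / 116,
            (-6 * n1 - 10 * n2 + 15 * n3) / 116)


def model(a0, a1, a2):
    """Theoretical values of eq. (2) for the ten dimer classes."""
    return {"AA": a0 - a1 - a2, "AT": a0 - a1, "TA": a0 - a1,
            "CA": a0, "GT": a0, "CT": a0 + a2, "GA": a0 + a2,
            "CG": a0 + a1, "GC": a0 + a1, "GG": a0 + a1 + 3 * a2}


def goodness_of_fit(dg):
    """Return (s2, chi2): mean square deviation and sum of squared residuals / y_th."""
    th = model(*fit_alphas(dg))
    sq = {d: (dg[d] - th[d]) ** 2 for d in dg}
    s2 = sum(sq.values()) / len(dg)
    chi2 = sum(sq[d] / th[d] for d in dg)
    return s2, chi2


s2, chi2 = goodness_of_fit(GOTOH)
print(f"s2 = {s2:.4f}, chi2 = {chi2:.4f}")
```

For this column one obtains $s^2 \approx 0.0015$ and $\chi^2 \approx 0.024$, consistent with the tabulated values.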



3 Sum rules 

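We derive from eq. (2) a set of identities and sum rules. First, the right-hand side of eq. (2) is invariant under the exchange of the two nucleotides of a dimer, since $J_{3H}$ and $J_{3V}$ are additive. In particular we get

$$\Delta G^\circ_{ij} = \Delta G^\circ_{ji} \qquad \text{and} \qquad \sum_{j=A,C,G,T} \Delta G^\circ_{ij} = \sum_{j=A,C,G,T} \Delta G^\circ_{ji} \qquad (5)$$

$$\sum_{j=A,C,G,T} \Delta G^\circ_{Cj} = \sum_{j=A,C,G,T} \Delta G^\circ_{Gj} = 4\alpha_0 + 2\alpha_1 + 4\alpha_2 \qquad (6)$$

$$\sum_{j=A,C,G,T} \Delta G^\circ_{Aj} = \sum_{j=A,C,G,T} \Delta G^\circ_{Tj} = 4\alpha_0 - 2\alpha_1 \qquad (7)$$

$$\sum_{i,j=A,C,G,T} \Delta G^\circ_{ij} = 16\alpha_0 + 8\alpha_2 \qquad (8)$$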
In Table 1 we report the experimental values of these sums, computed from the values of Table 6. The existence of the sum rules (6) and (7) was already remarked in ref. [2]. However, according to the relation postulated there, the two sums should take the same value, which is actually not the case.
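The sums in Table 1 follow from Table 6 once each complementary pair is expanded into its two dimers (5'-ij-3' and its reverse complement). Below is a small Python sketch for the Gotoh column; the helper names are illustrative.

```python
# Gotoh column of Table 6 (kcal/mol); keys are the first dimer of each pair.
GOTOH = {"AA": 0.43, "AT": 0.27, "TA": 0.22, "CA": 0.97, "GT": 0.98,
         "CT": 0.83, "GA": 0.93, "CG": 1.70, "GC": 1.64, "GG": 1.22}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dimer):
    """Dimer read 5'->3' on the opposite strand, e.g. CA -> TG."""
    return COMPLEMENT[dimer[1]] + COMPLEMENT[dimer[0]]


def expand(dg10):
    """Assign each of the 16 dimers the value of its complementary pair."""
    full = {}
    for dimer, value in dg10.items():
        full[dimer] = value
        full[reverse_complement(dimer)] = value
    return full


full = expand(GOTOH)
# Sums over the second nucleotide, as in eqs. (6)-(7) and Table 1.
row_sums = {i: sum(full[i + j] for j in "ACGT") for i in "CGTA"}
for i, total in row_sums.items():
    print(f"sum_j dG_{i}j = {total:.2f}")
```

The four sums come out as 4.72, 4.77, 2.55 and 2.51 for C, G, T and A, matching the Gotoh column of Table 1. Eqs. (6)-(7) predict the first two (and the last two) to coincide, whereas the relation of [2] would require all four to be equal.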



Table 1: Experimental values of the sums of free energies [see eq. (5)].

            Gotoh  Vologodskii  Breslauer  Delcourt  SantaLucia  Sugimoto  Unified  Benight
Σ_i ΔG°_Ci  4.72   6.16         9.18       5.78      6.72        8.10      6.74     6.54
Σ_i ΔG°_Gi  4.77   6.20         8.11       5.80      6.94        7.40      6.82     6.26
Σ_i ΔG°_Ti  2.55   4.27         5.63       3.68      5.08        5.30      4.33     4.45
Σ_i ΔG°_Ai  2.51   4.21         5.33       3.74      4.51        5.10      4.60     4.27


Due to the complementarity rule, one has

$$\sum_{i=A,C,G,T} \Delta G^\circ_{Ci} = \sum_{i=A,C,G,T} \Delta G^\circ_{iG}, \qquad \sum_{i=A,C,G,T} \Delta G^\circ_{Ai} = \sum_{i=A,C,G,T} \Delta G^\circ_{iT}$$

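Now we also derive new sum rules:

$$\Delta G^\circ_{CG} + \Delta G^\circ_{TA} = 2\,\Delta G^\circ_{TG} = 2\,\Delta G^\circ_{GT} \qquad (9)$$

$$\Delta G^\circ_{GC} + \Delta G^\circ_{AT} = 2\,\Delta G^\circ_{AC} = 2\,\Delta G^\circ_{CA} \qquad (10)$$

$$\Delta G^\circ_{GG} + \Delta G^\circ_{AA} = 2\,\Delta G^\circ_{GA} = 2\,\Delta G^\circ_{AG} \qquad (11)$$

$$\Delta G^\circ_{CC} + \Delta G^\circ_{TT} = 2\,\Delta G^\circ_{CT} = 2\,\Delta G^\circ_{TC} \qquad (12)$$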

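We report in Table 2 a comparison with the experimental data, averaging the different experimental values that are theoretically equal due to eq. (5). That is, we consider the combinations

$$S_1 = \Delta G^\circ_{CG} + \Delta G^\circ_{TA} + \Delta G^\circ_{GC} + \Delta G^\circ_{AT} - \Delta G^\circ_{TG} - \Delta G^\circ_{GT} - \Delta G^\circ_{AC} - \Delta G^\circ_{CA} \qquad (14)$$

$$S_2 = \Delta G^\circ_{GG} + \Delta G^\circ_{AA} + \Delta G^\circ_{CC} + \Delta G^\circ_{TT} - \Delta G^\circ_{GA} - \Delta G^\circ_{AG} - \Delta G^\circ_{CT} - \Delta G^\circ_{TC} \qquad (15)$$

both of which vanish identically when eq. (2) holds.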
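Because each complementary pair carries a single experimental value, $S_1$ and $S_2$ can be computed from any column of Table 6 once the 16 dimers are expanded. The Python sketch below does this for the Gotoh and Breslauer columns; the helper names are our own.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Two columns of Table 6 (kcal/mol); keys are the first dimer of each pair.
GOTOH = {"AA": 0.43, "AT": 0.27, "TA": 0.22, "CA": 0.97, "GT": 0.98,
         "CT": 0.83, "GA": 0.93, "CG": 1.70, "GC": 1.64, "GG": 1.22}
BRESLAUER = {"AA": 1.66, "AT": 1.19, "TA": 0.76, "CA": 1.80, "GT": 1.13,
             "CT": 1.35, "GA": 1.41, "CG": 3.28, "GC": 2.82, "GG": 2.75}


def expand(dg10):
    """Assign each of the 16 dimers the value of its complementary pair."""
    full = {}
    for dimer, value in dg10.items():
        full[dimer] = value
        full[COMPLEMENT[dimer[1]] + COMPLEMENT[dimer[0]]] = value
    return full


def sum_rules(dg10):
    """S1 and S2 of eqs. (14)-(15); both vanish if eq. (2) holds exactly."""
    g = expand(dg10)
    s1 = (g["CG"] + g["TA"] + g["GC"] + g["AT"]
          - g["TG"] - g["GT"] - g["AC"] - g["CA"])
    s2 = (g["GG"] + g["AA"] + g["CC"] + g["TT"]
          - g["GA"] - g["AG"] - g["CT"] - g["TC"])
    return s1, s2


for name, data in [("Gotoh", GOTOH), ("Breslauer", BRESLAUER)]:
    s1, s2 = sum_rules(data)
    print(f"{name}: S1 = {s1:+.2f}, S2 = {s2:+.2f}")
```

It returns $S_1 \approx -0.07$, $S_2 \approx -0.22$ for Gotoh and $S_1 \approx 2.19$, $S_2 \approx 3.30$ for Breslauer, the corresponding entries of Table 2.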



Table 2: Sum rules for free energies [see eqs. (14)-(15)].

     Gotoh  Vologodskii  Breslauer  Delcourt  SantaLucia  Sugimoto  Unified  Benight
S1   -0.07  0.08         2.19       0.10      -0.09       0.50      0.09     -0.32
S2   -0.22  0.24         3.30       -0.14     0.34        0.60      0.52     0.36



As can be seen, the sum rules are reasonably well satisfied, except for the data of Breslauer. However, we cannot make any statement on the reliability of the different experimental data on the basis of how accurately they fit our empirical formula, eq. (2).

4 Dinucleotide distribution 

In order to put our analysis on firmer theoretical ground, we consider the dimer correlation function. In [2] the dimer distribution was characterized by the correlation function

$$\Gamma_{ij} = f_{ij} - f_i f_j \qquad (16)$$

where the labels $i, j \in \{A, C, G, T\}$ denote the nucleotides, and $f_i$ ($f_{ij}$) denotes the frequency of the nucleotide $i$ (of the dinucleotide $ij$). From eq. (16) it follows that

$$\sum_{i=A,C,G,T} \Gamma_{ij} = \sum_{j=A,C,G,T} \Gamma_{ij} = 0 \qquad (17)$$
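To make eqs. (16)-(17) concrete, the Python sketch below estimates $\Gamma_{ij}$ from a single sequence. Two assumptions are ours. First, frequencies are plain counts divided by the sequence length. Second, dimers are counted on the sequence closed into a circle, which makes the row and column sums vanish exactly; on an open sequence they vanish only up to end effects of order 1/N. The toy sequence is arbitrary.

```python
from collections import Counter
from itertools import product


def correlation_matrix(seq):
    """Gamma_ij = f_ij - f_i*f_j (eq. (16)).

    Dimers are counted on the sequence closed into a circle, so there are
    exactly len(seq) dimers (an assumption of this sketch).
    """
    n = len(seq)
    f1 = Counter(seq)
    f2 = Counter(seq[k] + seq[(k + 1) % n] for k in range(n))
    return {i + j: f2[i + j] / n - (f1[i] / n) * (f1[j] / n)
            for i, j in product("ACGT", repeat=2)}


gamma = correlation_matrix("ATGCGCGATTACAGGCTTAA")  # toy sequence, illustration only
for x in "ACGT":
    row = sum(gamma[x + j] for j in "ACGT")
    col = sum(gamma[i + x] for i in "ACGT")
    print(x, f"{row:+.1e}", f"{col:+.1e}")  # both vanish up to rounding, eq. (17)
```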

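In [2] the following relation between $\Gamma_{ij}$ and the free energy $\Delta G$ was assumed:

$$\Gamma_{ij} = a + b\,\Delta G^\circ_{ij} \qquad (18)$$

where $a$ and $b$ are parameters depending on the biological species. Inserting eq. (18) into eq. (17), one gets the identity

$$4a + b \sum_{j=A,C,G,T} \Delta G^\circ_{ij} = 0 \quad \Longrightarrow \quad \sum_{j=A,C,G,T} \Delta G^\circ_{ij} = \text{const} \quad \text{for all } i \qquad (19)$$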

In ref. [2], using the data reported in Table 6 except for the last column, which was not considered, the authors show that eq. (19) is satisfied only by the weak dimers, i.e. those with label $i \in \{A, T\}$. Let us remark the following:

i) the statistical mechanics motivation that led the authors to postulate eq. (18) holds for an isolated system, which is not the case for a dimer inserted in a DNA strand;

ii) the computed values of the correlation matrix for a given biological species (see Table 3 of [2]) in many cases vary much more across the dimers $ij$ than the corresponding free energies do;

iii) our empirical formula eq. (2) predicts the dimers $ij$ and $ji$ to have the same free energy, which is approximately true (see Table 6), whereas the correlation function $\Gamma_{ij}$ is in general not symmetric.

On the basis of the above remarks, we assume the following relation between $\Gamma_{ij}$ and $\Delta G^\circ_{ij}$:

$$\Gamma_{ij} = a + b\Big(\Delta G^\circ_{ij} - \frac{1}{4}\sum_{k=A,C,G,T}\big(\Delta G^\circ_{ki} + \Delta G^\circ_{jk}\big)\Big) + (1-\delta_{ij})\,h_{ij} \qquad (20)$$

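where the $h_{ij}$ are real coefficients depending on the biological species. Complementarity implies that the coefficients $h_{ij}$ are equal for a dimer $ij$ (read from 5' to 3') and its complementary dimer (read from 3' to 5' on the opposite strand). Since the factor $(1-\delta_{ij})$ removes the four homodimers, there are only 8 independent coefficients $h_{ij}$.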

The corrective term in the free energy can be regarded as a "penalty" due to the interaction of the nucleotides of the dimer with their two nearest-neighbour nucleotides in the strand. These neighbours are assumed to be uniformly distributed, hence the average over the four possible neighbours $k$.

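Since the correlation coefficients have to satisfy the sum rule (17) by definition, one is led to the constraints (for every $j$)

$$0 = \sum_{i=A,C,G,T} \Gamma_{ji} = 4a + b \sum_{i=A,C,G,T} \big(\Delta G^\circ_{ji} - \Delta G^\circ_{ij}\big) - \frac{b}{4} \sum_{k,l=A,C,G,T} \Delta G^\circ_{kl} + \sum_{i=A,C,G,T} (1-\delta_{ij})\,h_{ji}$$

$$0 = \sum_{i=A,C,G,T} \Gamma_{ij} = 4a + b \sum_{i=A,C,G,T} \big(\Delta G^\circ_{ij} - \Delta G^\circ_{ji}\big) - \frac{b}{4} \sum_{k,l=A,C,G,T} \Delta G^\circ_{kl} + \sum_{i=A,C,G,T} (1-\delta_{ij})\,h_{ij} \qquad (21)$$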

Eqs. (5), (8) and (21) imply, for every nucleotide $j$,

$$2b\,(2\alpha_0 + \alpha_2) - 4a = \sum_{k=A,C,G,T} (1-\delta_{jk})\,h_{jk} = \sum_{k=A,C,G,T} (1-\delta_{kj})\,h_{kj} \qquad (22)$$
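As eq. (22) gives 4 independent relations, we are left with 4 free parameters $h_{ij}$. We remark that, when the free energies are given by eq. (2), only the following combinations of the parameters $a$, $b$ and $\alpha_i$ appear in the free energy term of eq. (20):

$$x = a - b\,\alpha_0 \qquad \text{and} \qquad y = b\,\alpha_2 \qquad (23)$$

The $\alpha_1$ contribution cancels because $J_{3H}$ is additive and the values of $J_{3H}$ of the four nucleotides add up to zero.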
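The statement that only $x$ and $y$ survive can be verified numerically. The Python sketch below assumes nucleotide labels chosen to reproduce the dimer labels of Table 5: C = (+1/2, +1/2), T = (-1/2, +1/2), G = (+1/2, -1/2), A = (-1/2, -1/2). The function names are illustrative.

```python
# Assumed crystal-basis labels (J3H, J3V), chosen to reproduce Table 5.
NUC = {"C": (0.5, 0.5), "T": (-0.5, 0.5), "G": (0.5, -0.5), "A": (-0.5, -0.5)}
BASES = "ACGT"


def delta_g(dimer, a0, a1, a2):
    """Eq. (2), with J3H and J3V obtained by adding the nucleotide labels."""
    j3h = sum(NUC[n][0] for n in dimer)
    j3v = sum(NUC[n][1] for n in dimer)
    return a0 + a1 * j3h + a2 * j3v ** 2 * (2 * j3h + 1)


def free_energy_term(dimer, a0, a1, a2):
    """Bracket of eq. (20): dG_ij - (1/4) * sum_k (dG_ki + dG_jk)."""
    i, j = dimer
    penalty = sum(delta_g(k + i, a0, a1, a2) + delta_g(j + k, a0, a1, a2) for k in BASES) / 4
    return delta_g(dimer, a0, a1, a2) - penalty


DIMERS = [x + y for x in BASES for y in BASES]

# Changing a1 leaves every bracket unchanged (up to rounding) ...
print(max(abs(free_energy_term(d, 1.0, 0.3, -0.2) - free_energy_term(d, 1.0, 5.0, -0.2))
          for d in DIMERS))
# ... and with a0 = a1 = 0, a2 = 1 we read off the coefficient of a2 for each dimer.
for d in DIMERS:
    print(d, free_energy_term(d, 0.0, 0.0, 1.0))
```

With $\alpha_2 = 0$ the bracket reduces to $-\alpha_0$ for every dimer. The last loop prints the coefficient of $\alpha_2$; for example CG gives -2.0 and CC gives 1.0.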

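We then deduce from the 4 constraints (22) the following relations among the coefficients $h_{ij}$ (as representatives of the 8 independent coefficients we choose $h_{CA}$, $h_{CT}$, $h_{CG}$, $h_{AC}$, $h_{TC}$, $h_{GC}$, $h_{AT}$, $h_{TA}$):

$$h_{CG} + h_{GC} - h_{AT} - h_{TA} = 0 \qquad (24)$$
$$h_{TC} - h_{CT} + h_{GC} - h_{AT} = 0 \qquad (25)$$
$$h_{CA} - h_{AC} + h_{CT} - h_{TC} + h_{CG} - h_{GC} = 0 \qquad (26)$$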
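The step from eq. (22) to eqs. (24)-(26) can be made explicit. Complementarity gives $h_{GA} = h_{TC}$, $h_{GT} = h_{AC}$, $h_{AG} = h_{CT}$ and $h_{TG} = h_{CA}$. Writing $H \equiv 2b(2\alpha_0 + \alpha_2) - 4a$ (a shorthand introduced here), the row sums of eq. (22) read as follows, and the column sums reproduce the same four equations:

$$\begin{aligned}
j=C:&\quad h_{CA} + h_{CG} + h_{CT} = H,\\
j=G:&\quad h_{TC} + h_{GC} + h_{AC} = H,\\
j=A:&\quad h_{AC} + h_{CT} + h_{AT} = H,\\
j=T:&\quad h_{TA} + h_{TC} + h_{CA} = H.
\end{aligned}$$

Subtracting the second line from the first gives eq. (26), and subtracting the third from the second gives eq. (25). Adding the differences (first minus third) and (second minus fourth) gives eq. (24). Since $H$ is eliminated, these three relations do not involve $a$ and $b$.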

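Using eq. (20), we can rewrite these equations as sum rules for the corresponding correlation coefficients:

$$\Gamma_{CG} + \Gamma_{GC} - \Gamma_{AT} - \Gamma_{TA} = -4y = 2\,(\Gamma_{AA} - \Gamma_{CC}) \qquad (27)$$

$$\Gamma_{CT} - \Gamma_{TC} + \Gamma_{CG} - \Gamma_{TA} = -2y = \Gamma_{AA} - \Gamma_{CC} \qquad (28)$$

$$\Gamma_{CA} - \Gamma_{AC} + \Gamma_{CT} - \Gamma_{TC} + \Gamma_{CG} - \Gamma_{GC} = 0 \qquad (29)$$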

The above equations are well satisfied (within 5%) by the data of Table 3 of [2]. We therefore conclude that our parametrization (20) of the correlation function is satisfactory, and we can carry on with our analysis.
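As an illustration of how eqs. (27)-(29) follow from eq. (20), note that with the free energies of eq. (2) the bracket in eq. (20) equals $-\alpha_0 + \alpha_2 e_{ij}$. Here $e_{ij}$ is a shorthand introduced here, with $e_{AA} = e_{TT} = -1$, $e_{CC} = e_{GG} = 1$, $e_{CG} = e_{GC} = -2$ and $e_{AT} = e_{TA} = e_{CT} = e_{TC} = 0$. Hence $\Gamma_{ij} = x + y\,e_{ij} + (1-\delta_{ij})\,h_{ij}$, and for eq. (27):

$$\begin{aligned}
\Gamma_{CG} + \Gamma_{GC} - \Gamma_{AT} - \Gamma_{TA}
 &= (x - 2y + h_{CG}) + (x - 2y + h_{GC}) - (x + h_{AT}) - (x + h_{TA})\\
 &= -4y + (h_{CG} + h_{GC} - h_{AT} - h_{TA}) = -4y,\\
2\,(\Gamma_{AA} - \Gamma_{CC}) &= 2\,\big[(x - y) - (x + y)\big] = -4y,
\end{aligned}$$

where eq. (24) has been used in the second line. Eqs. (28) and (29) follow in the same way from eqs. (24)-(26).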

Consider the following differences of the correlation coefficients: $\Gamma_{CT} - \Gamma_{TC}$, $\Gamma_{TT} - \Gamma_{CC}$ and $\Gamma_{AT} - \Gamma_{GC}$. Inserting the theoretical expression (20) of $\Gamma_{ij}$, one gets for each of the three differences:

$$\Gamma_{CT} - \Gamma_{TC} = Z_{CT-TC}\,b + h_{CT} - h_{TC} \qquad (30)$$
$$\Gamma_{TT} - \Gamma_{CC} = Z_{TT-CC}\,b \qquad (31)$$
$$\Gamma_{AT} - \Gamma_{GC} = Z_{AT-GC}\,b + h_{AT} - h_{GC} \qquad (32)$$

where the coefficients $Z$ are functions of the free energies $\Delta G$. Explicitly, $Z_{ij-kl} = F_{ij} - F_{kl}$, where $F_{ij}$ is the bracket multiplying $b$ in eq. (20). Summing the three equations above, the l.h.s. vanishes, due to eq. (17) and to the equality of the correlation coefficients of complementary dimers. Using eq. (25), this implies that the coefficients $Z$ are related:

$$Z_{CT-TC} + Z_{TT-CC} + Z_{AT-GC} = 0 \qquad (33)$$

Let us emphasize that this relation is independent of the biological species. This follows from eq. (25), which is valid for each biological species, and from the complementarity rule for $\Gamma_{ij}$.

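Note also that relation (33) is automatically satisfied when the theoretical expressions of the dimer free energies (i.e. in terms of the parameters $\alpha_0$, $\alpha_1$ and $\alpha_2$) are plugged in.
Analogously, using eq. (26) and the complementarity rule, we get

$$Z_{CA-GT} + Z_{CT-GA} + Z_{CG-GC} = 0 \qquad (34)$$

By complementarity, $Z_{CA-GT} = Z_{CA-AC}$ and $Z_{CT-GA} = Z_{CT-TC}$; these are the entries listed in Table 4.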



Note that eq. (27) is satisfied identically as a consequence of the parametrization (20) and the constraint (21).
We report in Tables 3 and 4 the values of the coefficients $Z$ and their sums, calculated with the experimental free energies given by the different authors (see Table 6). As can be seen, most of the sums are quite close to zero, except for Breslauer, SantaLucia and Sugimoto.
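The $Z$ coefficients can be computed directly from a column of Table 6. In the sketch below, the explicit form $Z_{ij-kl} = F_{ij} - F_{kl}$, with $F_{ij}$ the bracket multiplying $b$ in eq. (20), is our reading of eqs. (30)-(32). For the Gotoh column it reproduces the entries of Tables 3 and 4 to within rounding. The helper names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Gotoh column of Table 6 (kcal/mol); keys are the first dimer of each pair.
GOTOH = {"AA": 0.43, "AT": 0.27, "TA": 0.22, "CA": 0.97, "GT": 0.98,
         "CT": 0.83, "GA": 0.93, "CG": 1.70, "GC": 1.64, "GG": 1.22}


def expand(dg10):
    """Assign each of the 16 dimers the value of its complementary pair."""
    full = {}
    for dimer, value in dg10.items():
        full[dimer] = value
        full[COMPLEMENT[dimer[1]] + COMPLEMENT[dimer[0]]] = value
    return full


def free_energy_term(full, dimer):
    """F_ij = dG_ij - (1/4) * sum_k (dG_ki + dG_jk), the bracket multiplying b in eq. (20)."""
    i, j = dimer
    return full[dimer] - sum(full[k + i] + full[j + k] for k in "ACGT") / 4


def z(full, d1, d2):
    """Z_{d1-d2} = F_d1 - F_d2, so that Gamma_d1 - Gamma_d2 = Z*b + (h terms)."""
    return free_energy_term(full, d1) - free_energy_term(full, d2)


full = expand(GOTOH)
table3 = [z(full, "CT", "TC"), z(full, "TT", "CC"), z(full, "AT", "GC")]
table4 = [z(full, "CA", "AC"), z(full, "CT", "TC"), z(full, "CG", "GC")]
print("eq. (33):", [f"{v:+.4f}" for v in table3], f"sum {sum(table3):+.4f}")
print("eq. (34):", [f"{v:+.4f}" for v in table4], f"sum {sum(table4):+.4f}")
```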
Table 3: Values of the coefficients Z of eq. (33).

         Gotoh   Vologodskii  Breslauer  Delcourt  SantaLucia  Sugimoto  Unified  Benight
Z_CT-TC  -0.123  -0.115       0.133      0.060     -0.498      0.125     0.027    0.005
Z_TT-CC  0.318   0.220        0.492      0.160     0.268       0.375     0.318    0.080
Z_AT-GC  -0.285  -0.205       0.145      -0.180    -0.560      0.000     -0.155   0.015
sum      -0.090  -0.100       0.770      0.040     -0.790      0.500     0.190    0.100








Table 4: Values of the coefficients Z of eq. (34).

         Gotoh   Vologodskii  Breslauer  Delcourt  SantaLucia  Sugimoto  Unified  Benight
Z_CA-AC  -0.013  0.025        1.013      -0.110    0.358       0.425     -0.077   0.405
Z_CT-TC  -0.123  -0.115       0.133      0.060     -0.498      0.125     0.027    0.005
Z_CG-GC  0.035   0.010        0.995      0.010     -0.300      0.850     -0.110   0.150
sum      -0.100  -0.080       2.140      -0.040    -0.440      1.400     -0.160   0.560



5 Conclusions 

We have proposed a 3-parameter formula to fit the free energy of the DNA dinucleotides and derived a set of sum rules. We have compared the theoretical values with the experimental data of seven groups as well as with their averaged values. The results of the fits (Table 7) and of the sum rules (Table 2) show, on average, satisfactory agreement, except for Breslauer. On the basis of this comparison alone, we cannot make any statement on the reliability of the different experimental data.

In order to support our analysis with general theoretical arguments, we postulate a relation between the free energy and the dimer correlation function, eq. (20). This relation has a theoretical motivation from statistical mechanics as well as an empirical motivation from the analysis of the computed correlation function. Our postulated equation satisfies the identity that the sum of the correlation functions has to satisfy by definition. From consistency equations, we derive a set of sum rules for the correlation functions, which are well satisfied by the computed values for several biological species. This analysis supports the validity of our relation eq. (20), which allows us to perform consistency checks independent of the biological species; these checks are remarkably well verified by our theoretical formula.

We have checked which sets of experimental data satisfy the consistency relations. The result is that the data of Breslauer, SantaLucia and Sugimoto are not consistent. Therefore we disagree with the conclusions of [2]. The results of our analysis are closer to those of [?].
Table 5: Dimer representation content.

dimer  J_H  J_V  J_3H  J_3V     dimer  J_H  J_V  J_3H  J_3V
CC      1    1     1     1      GC      1    1     1     0
CT      0    1     0     1      GT      0    1     0     0
CG      1    0     1     0      GG      1    1     1    -1
CA      0    0     0     0      GA      0    1     0    -1
TC      1    1     0     1      AC      1    1     0     0
TT      1    1    -1     1      AT      1    1    -1     0
TG      1    0     0     0      AG      1    1     0    -1
TA      1    0    -1     0      AA      1    1    -1    -1



Table 6: Experimental absolute values for dimer free energy parameters ΔG (in kcal/mol).

        Gotoh  Vologodskii  Breslauer  Delcourt  SantaLucia  Sugimoto  Unified  Benight
AA/TT   0.43   0.89         1.66       0.67      1.02        1.20      1.00     0.91
AT/TA   0.27   0.81         1.19       0.62      0.90        0.90      0.88     0.83
TA/AT   0.22   0.76         0.76       0.70      0.90        0.90      0.58     0.68
CA/GT   0.97   1.37         1.80       1.19      1.70        1.70      1.45     1.54
GT/CA   0.98   1.35         1.13       1.28      1.43        1.50      1.44     1.25
CT/GA   0.83   1.16         1.35       1.17      1.16        1.50      1.28     1.28
GA/CT   0.93   1.25         1.41       1.12      1.46        1.50      1.30     1.30
CG/GC   1.70   1.99         3.28       1.87      2.09        2.80      2.17     1.87
GC/CG   1.64   1.96         2.82       1.85      2.28        2.30      2.24     1.86
GG/CC   1.22   1.64         2.75       1.55      1.77        2.10      1.84     1.85



Table 7: Fitted absolute values for dimer free energy parameters ΔG (in kcal/mol).

        Gotoh  Vologodskii  Breslauer  Delcourt  SantaLucia  Sugimoto  Unified  Benight
AA/TT   0.46   0.92         1.13       0.75      1.08        1.11      0.93     0.85
AT/TA   0.30   0.79         0.93       0.65      0.91        0.93      0.78     0.81
TA/AT   0.30   0.79         0.93       0.65      0.91        0.93      0.78     0.81
CA/GT   1.02   1.40         1.94       1.26      1.57        1.75      1.51     1.36
GT/CA   1.02   1.40         1.94       1.26      1.57        1.75      1.51     1.36
CT/GA   0.85   1.27         1.73       1.16      1.40        1.57      1.35     1.33
GA/CT   0.85   1.27         1.73       1.16      1.40        1.57      1.35     1.33
CG/GC   1.73   2.01         2.94       1.88      2.24        2.57      2.25     1.90
GC/CG   1.73   2.01         2.94       1.88      2.24        2.57      2.25     1.90
GG/CC   1.25   1.61         2.34       1.57      1.73        2.03      1.78     1.81






